inference speed
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
FastSpeech: Fast, Robust and Controllable Text to Speech
Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural-network-based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).
- Asia > China (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
training
RTFormer consists of several convolution blocks and RTFormer blocks, and the RTFormer block can contain different types of attention. Table 2 shows the performance of RTFormer on ImageNet classification. The first three multi-head external attention results use r = 0.125, 0.25, and 1, respectively. As shown in Table 3, multi-head self-attention achieves 32.7 mIoU, outperforming multi-head external attention under every tested setting of r. Multi-head external attention achieves good inference speed, which comes from its linear complexity and from sharing the external parameters across multiple heads. However, the performance of multi-head external attention is suboptimal, because these same designs limit the network's capacity.
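The speed/capacity trade-off described above can be illustrated with a minimal NumPy sketch of multi-head external attention. It assumes the double-normalized formulation from Guo et al.'s external attention, with one pair of external key/value memories shared by all heads. This is not RTFormer's implementation. The function names, the slot count `S`, and the `attention_macs` cost helper are hypothetical. How the paper's ratio r maps onto the memory size is not stated in this excerpt, so r is not modeled.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_external_attention(x, mem_k, mem_v, num_heads, eps=1e-9):
    """Hypothetical sketch: x is (N, d); mem_k and mem_v are (S, d // num_heads)
    external memories shared by every head (assumption, per the excerpt)."""
    n, d = x.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    d_h = d // num_heads
    # Split channels into heads: (H, N, d_h).
    heads = x.reshape(n, num_heads, d_h).transpose(1, 0, 2)
    # Scores against S learned slots: (H, N, S), so cost grows linearly in N.
    scores = heads @ mem_k.T
    # Double normalization (Guo et al.): softmax over tokens,
    # then l1-normalize over the S slots.
    attn = softmax(scores, axis=1)
    attn = attn / (attn.sum(axis=2, keepdims=True) + eps)
    # Aggregate the shared value memory and merge heads back to (N, d).
    out = attn @ mem_v
    return out.transpose(1, 0, 2).reshape(n, d)


def attention_macs(n, d, s=None):
    """Rough multiply count of the score and aggregation matmuls, summed over
    heads. s=None means self-attention (N x N scores); otherwise S slots."""
    if s is None:
        return 2 * n * n * d
    return 2 * n * s * d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, H, S = 64, 32, 4, 16
    x = rng.standard_normal((N, d))
    mem_k = rng.standard_normal((S, d // H))
    mem_v = rng.standard_normal((S, d // H))
    y = multi_head_external_attention(x, mem_k, mem_v, H)
    print(y.shape)  # (64, 32)
    # Compare rough costs as the token count doubles.
    for n in (1024, 2048):
        print(n, attention_macs(n, d), attention_macs(n, d, S))
```

Because the score matrix is N × S rather than N × N, doubling the number of tokens doubles the cost of this sketch but quadruples that of self-attention. Sharing a single memory across heads also keeps the external parameter count independent of the number of heads, which matches the excerpt's point that these designs buy speed at the expense of capacity.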